Bay Area Bike Share is a company that provides on-demand bike rentals for customers in San Francisco, Redwood City, Palo Alto, Mountain View, and San Jose. The bikes can be unlocked from one station and returned to any other station in the system, making them ideal for one-way trips. People use bike share to commute to work or school, run errands, get to appointments or social engagements and more. It’s a fun, convenient and affordable way to get around. Users pay for the service either through a yearly subscription or by purchasing 3-day or 24-hour passes. Users can make an unlimited number of trips, with trips under thirty minutes in length having no additional charge; longer trips will incur overtime fees.
Let’s get a quick peek at the trips and stations dataset. The trips dataset contains information about trip id, duration of the trip, start station, end station, etc. The stations dataset gives us spatial information, dock count, etc. for each station. We merge the trips and stations dataset in order to map the latitude and longitude data to start and end stations in the trips dataset. Moreover, we create the columns start_month, start_hour, start_day, weekday, etc. to help us dig deeper in the dataset.
trips_dataset <- read.csv("data/trips_dataset.csv")
Table(trips_dataset,limit = 20,scroll = T)
| trip_id | duration_sec | start_date | start_station_name | start_station_id | end_date | end_station_name | end_station_id | bike_number | zip_code | subscriber_type |
|---|---|---|---|---|---|---|---|---|---|---|
| 944732 | 2618 | 2015-09-24 17:22:00 | Mezes | 83 | 2015-09-24 18:06:00 | Mezes | 83 | 653 | 94063 | Customer |
| 1030383 | 5780 | 2015-12-06 10:52:00 | Mezes | 83 | 2015-12-06 12:28:00 | Mezes | 83 | 44 | 94064 | Customer |
| 1102641 | 801 | 2016-02-23 12:25:00 | Mezes | 83 | 2016-02-23 12:39:00 | Mezes | 83 | 174 | 93292 | Customer |
| 1102613 | 1334 | 2016-02-23 12:00:00 | San Mateo County Center | 23 | 2016-02-23 12:22:00 | Mezes | 83 | 174 | 93292 | Customer |
| 1114916 | 1642 | 2016-03-03 21:54:00 | Redwood City Public Library | 24 | 2016-03-03 22:22:00 | Mezes | 83 | 208 | 94063 | Customer |
| 1114919 | 1377 | 2016-03-03 22:00:00 | Redwood City Public Library | 24 | 2016-03-03 22:23:00 | Mezes | 83 | 52 | 94063 | Customer |
| 1210164 | 232 | 2016-05-20 21:33:00 | Redwood City Caltrain Station | 22 | 2016-05-20 21:37:00 | Mezes | 83 | 685 | 94063 | Customer |
| 249715 | 1702 | 2014-04-15 20:17:00 | Japantown | 9 | 2014-04-15 20:45:00 | Japantown | 9 | 49 | 60305 | Customer |
| 871985 | 9699 | 2015-08-01 19:59:00 | Japantown | 9 | 2015-08-01 22:41:00 | Japantown | 9 | 488 | 95054 | Customer |
| 212536 | 872 | 2014-03-13 12:13:00 | Japantown | 9 | 2014-03-13 12:28:00 | Japantown | 9 | 672 | 95031 | Customer |
| 363634 | 1334 | 2014-07-14 23:40:00 | Japantown | 9 | 2014-07-15 00:02:00 | Japantown | 9 | 112 | 95131 | Customer |
| 252213 | 6320 | 2014-04-17 15:09:00 | Japantown | 9 | 2014-04-17 16:54:00 | Japantown | 9 | 659 | 94587 | Customer |
| 150636 | 545 | 2014-01-12 15:08:00 | Japantown | 9 | 2014-01-12 15:18:00 | Japantown | 9 | 80 | 95148 | Customer |
| 363633 | 1361 | 2014-07-14 23:39:00 | Japantown | 9 | 2014-07-15 00:02:00 | Japantown | 9 | 12 | 95121 | Customer |
| 163424 | 3813 | 2014-01-23 13:41:00 | Japantown | 9 | 2014-01-23 14:44:00 | Japantown | 9 | 59 | 95158 | Customer |
| 252212 | 6324 | 2014-04-17 15:08:00 | Japantown | 9 | 2014-04-17 16:54:00 | Japantown | 9 | 176 | 94587 | Customer |
| 1279192 | 320 | 2016-07-16 09:49:00 | Japantown | 9 | 2016-07-16 09:55:00 | Japantown | 9 | 64 | 95132 | Customer |
| 871977 | 10162 | 2015-08-01 19:51:00 | Japantown | 9 | 2015-08-01 22:40:00 | Japantown | 9 | 130 | 95051 | Customer |
| 158467 | 1392 | 2014-01-18 18:43:00 | Japantown | 9 | 2014-01-18 19:07:00 | Japantown | 9 | 159 | 94564 | Customer |
| 881594 | 2775 | 2015-08-08 16:37:00 | Japantown | 9 | 2015-08-08 17:23:00 | Japantown | 9 | 104 | 95136 | Customer |
stations_dataset <- read.csv("data/stations_dataset.csv")
Table(stations_dataset,limit = 20,scroll = T)
| station_id | name | latitude | longitude | dockcount | landmark | installation_date |
|---|---|---|---|---|---|---|
| 4 | Santa Clara at Almaden | 37.33399 | -121.8949 | 11 | San Jose | 2013-08-06 |
| 84 | Ryland Park | 37.34273 | -121.8956 | 15 | San Jose | 2014-04-09 |
| 8 | San Salvador at 1st | 37.33017 | -121.8858 | 15 | San Jose | 2013-08-05 |
| 9 | Japantown | 37.34874 | -121.8947 | 15 | San Jose | 2013-08-05 |
| 3 | San Jose Civic Center | 37.33070 | -121.8890 | 15 | San Jose | 2013-08-05 |
| 13 | St James Park | 37.33930 | -121.8899 | 15 | San Jose | 2013-08-06 |
| 10 | San Jose City Hall | 37.33739 | -121.8870 | 15 | San Jose | 2013-08-06 |
| 16 | SJSU - San Salvador at 9th | 37.33396 | -121.8773 | 15 | San Jose | 2013-08-07 |
| 7 | Paseo de San Antonio | 37.33380 | -121.8869 | 15 | San Jose | 2013-08-07 |
| 6 | San Pedro Square | 37.33672 | -121.8941 | 15 | San Jose | 2013-08-07 |
| 80 | Santa Clara County Civic Center | 37.35260 | -121.9057 | 15 | San Jose | 2013-12-31 |
| 14 | Arena Green / SAP Center | 37.33269 | -121.9001 | 19 | San Jose | 2013-08-05 |
| 5 | Adobe on Almaden | 37.33141 | -121.8932 | 19 | San Jose | 2013-08-05 |
| 11 | MLK Library | 37.33588 | -121.8857 | 19 | San Jose | 2013-08-06 |
| 12 | SJSU 4th at San Carlos | 37.33281 | -121.8839 | 19 | San Jose | 2013-08-07 |
| 89 | S. Market st at Park Ave | 37.33240 | -121.8904 | 19 | San Jose | 2016-06-05 |
| 88 | 5th S. at E. San Salvador St | 37.33196 | -121.8816 | 19 | San Jose | 2016-06-05 |
| 2 | San Jose Diridon Caltrain Station | 37.32973 | -121.9018 | 27 | San Jose | 2013-08-06 |
| 37 | Cowper at University | 37.44860 | -122.1595 | 11 | Palo Alto | 2013-08-14 |
| 35 | University and Emerson | 37.44452 | -122.1631 | 11 | Palo Alto | 2013-08-15 |
## Get latitude-longitude and landmark details for start station and end station
trips_stations_dataset <- trips_dataset %>% left_join(stations_dataset,by = c("start_station_id"="station_id")) %>% mutate(start_station_latitude = latitude, start_station_longitude = longitude,start_station_landmark = landmark) %>% select(-c(latitude,longitude,dockcount,installation_date,name,landmark)) %>% left_join(stations_dataset,by = c("end_station_id"="station_id")) %>% mutate(end_station_latitude = latitude, end_station_longitude = longitude, end_station_landmark = landmark) %>% select(-c(latitude,longitude,dockcount,installation_date,name,landmark))
## Create start_hour, start_month, start_year, start_day, weekday
trips_stations_dataset <- trips_stations_dataset %>% mutate(start_year = year(start_date), start_month = month(start_date,label = T), start_day = day(start_date), start_hour = hour(start_date), weekday = wday(start_date,label = T))
Table(trips_stations_dataset,limit = 20,scroll = T)
| trip_id | duration_sec | start_date | start_station_name | start_station_id | end_date | end_station_name | end_station_id | bike_number | zip_code | subscriber_type | start_station_latitude | start_station_longitude | start_station_landmark | end_station_latitude | end_station_longitude | end_station_landmark | start_year | start_month | start_day | start_hour | weekday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 944732 | 2618 | 2015-09-24 17:22:00 | Mezes | 83 | 2015-09-24 18:06:00 | Mezes | 83 | 653 | 94063 | Customer | 37.49127 | -122.2362 | Redwood City | 37.49127 | -122.2362 | Redwood City | 2015 | Sep | 24 | 17 | Thu |
| 1030383 | 5780 | 2015-12-06 10:52:00 | Mezes | 83 | 2015-12-06 12:28:00 | Mezes | 83 | 44 | 94064 | Customer | 37.49127 | -122.2362 | Redwood City | 37.49127 | -122.2362 | Redwood City | 2015 | Dec | 6 | 10 | Sun |
| 1102641 | 801 | 2016-02-23 12:25:00 | Mezes | 83 | 2016-02-23 12:39:00 | Mezes | 83 | 174 | 93292 | Customer | 37.49127 | -122.2362 | Redwood City | 37.49127 | -122.2362 | Redwood City | 2016 | Feb | 23 | 12 | Tue |
| 1102613 | 1334 | 2016-02-23 12:00:00 | San Mateo County Center | 23 | 2016-02-23 12:22:00 | Mezes | 83 | 174 | 93292 | Customer | 37.48762 | -122.2300 | Redwood City | 37.49127 | -122.2362 | Redwood City | 2016 | Feb | 23 | 12 | Tue |
| 1114916 | 1642 | 2016-03-03 21:54:00 | Redwood City Public Library | 24 | 2016-03-03 22:22:00 | Mezes | 83 | 208 | 94063 | Customer | 37.48422 | -122.2274 | Redwood City | 37.49127 | -122.2362 | Redwood City | 2016 | Mar | 3 | 21 | Thu |
| 1114919 | 1377 | 2016-03-03 22:00:00 | Redwood City Public Library | 24 | 2016-03-03 22:23:00 | Mezes | 83 | 52 | 94063 | Customer | 37.48422 | -122.2274 | Redwood City | 37.49127 | -122.2362 | Redwood City | 2016 | Mar | 3 | 22 | Thu |
| 1210164 | 232 | 2016-05-20 21:33:00 | Redwood City Caltrain Station | 22 | 2016-05-20 21:37:00 | Mezes | 83 | 685 | 94063 | Customer | 37.48608 | -122.2321 | Redwood City | 37.49127 | -122.2362 | Redwood City | 2016 | May | 20 | 21 | Fri |
| 249715 | 1702 | 2014-04-15 20:17:00 | Japantown | 9 | 2014-04-15 20:45:00 | Japantown | 9 | 49 | 60305 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Apr | 15 | 20 | Tue |
| 871985 | 9699 | 2015-08-01 19:59:00 | Japantown | 9 | 2015-08-01 22:41:00 | Japantown | 9 | 488 | 95054 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2015 | Aug | 1 | 19 | Sat |
| 212536 | 872 | 2014-03-13 12:13:00 | Japantown | 9 | 2014-03-13 12:28:00 | Japantown | 9 | 672 | 95031 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Mar | 13 | 12 | Thu |
| 363634 | 1334 | 2014-07-14 23:40:00 | Japantown | 9 | 2014-07-15 00:02:00 | Japantown | 9 | 112 | 95131 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Jul | 14 | 23 | Mon |
| 252213 | 6320 | 2014-04-17 15:09:00 | Japantown | 9 | 2014-04-17 16:54:00 | Japantown | 9 | 659 | 94587 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Apr | 17 | 15 | Thu |
| 150636 | 545 | 2014-01-12 15:08:00 | Japantown | 9 | 2014-01-12 15:18:00 | Japantown | 9 | 80 | 95148 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Jan | 12 | 15 | Sun |
| 363633 | 1361 | 2014-07-14 23:39:00 | Japantown | 9 | 2014-07-15 00:02:00 | Japantown | 9 | 12 | 95121 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Jul | 14 | 23 | Mon |
| 163424 | 3813 | 2014-01-23 13:41:00 | Japantown | 9 | 2014-01-23 14:44:00 | Japantown | 9 | 59 | 95158 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Jan | 23 | 13 | Thu |
| 252212 | 6324 | 2014-04-17 15:08:00 | Japantown | 9 | 2014-04-17 16:54:00 | Japantown | 9 | 176 | 94587 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Apr | 17 | 15 | Thu |
| 1279192 | 320 | 2016-07-16 09:49:00 | Japantown | 9 | 2016-07-16 09:55:00 | Japantown | 9 | 64 | 95132 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2016 | Jul | 16 | 9 | Sat |
| 871977 | 10162 | 2015-08-01 19:51:00 | Japantown | 9 | 2015-08-01 22:40:00 | Japantown | 9 | 130 | 95051 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2015 | Aug | 1 | 19 | Sat |
| 158467 | 1392 | 2014-01-18 18:43:00 | Japantown | 9 | 2014-01-18 19:07:00 | Japantown | 9 | 159 | 94564 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2014 | Jan | 18 | 18 | Sat |
| 881594 | 2775 | 2015-08-08 16:37:00 | Japantown | 9 | 2015-08-08 17:23:00 | Japantown | 9 | 104 | 95136 | Customer | 37.34874 | -121.8947 | San Jose | 37.34874 | -121.8947 | San Jose | 2015 | Aug | 8 | 16 | Sat |
Let’s look at how the trips are divided by subscription type.
# Group by subscriber type and count number of trips.
trips_by_subscription <- trips_stations_dataset %>% group_by(subscriber_type) %>% summarise(total_trips = n())
plot_trips_by_subscription <- ggplot(data = trips_by_subscription) + geom_col(mapping = aes(x=subscriber_type,y=total_trips,fill= subscriber_type)) + xlab("Subscriber Type") + ylab("No of Trips") + ggtitle("Trips by Subscription") + geom_text(aes(x = subscriber_type,y = total_trips,label = total_trips),position = position_dodge(width = 0.9),vjust = -0.25) + scale_y_continuous(labels = scales::comma) + guides(fill = guide_legend(title = "Subscriber Type"))
plot_trips_by_subscription
It can be seen in the plot above that the number of trips by subscribers are a lot more than the customers.
In the dataset, 86.1% of the rides are taken by the subsribers whereas only 13.9% of the rides are taken by the customers.
The bike sharing service is available throughout the year. Let’s look at the total numbers of rides at a monthly level and see if we can find any interesting pattern.
trips_by_month <- trips_stations_dataset %>% group_by(start_month,subscriber_type) %>% summarise(total_trips = n())
plot_trips_by_month <- ggplot(data = trips_by_month) + geom_col(mapping = aes(x=start_month,y=total_trips,fill = subscriber_type),position = position_dodge2()) + xlab("Month") + ylab("No of Trips") + ggtitle("Trips by Month") + scale_y_continuous(labels = scales::comma) + guides(fill = guide_legend(title = "Subscriber Type"))
plot_trips_by_month
More number of trips are taken by subscribers from March to October as compared to November to February. The dip in December must be due to less usage of bikes as users are on vacation.
The weather from Nov-Feb is relatively cold. Some of the users must be using alternative mode of transporation in cold weather.
Similary, customers have taken more trips from March to October than from November to February.
Now we will group the rides by weekday and analyze the usage pattern of subsribers and consumers over weekday as well as weekend.
trips_by_weekday <- trips_stations_dataset %>% group_by(weekday,subscriber_type) %>% summarise(total_trips = n())
plot_trips_by_weekday <- ggplot(data = trips_by_weekday) + geom_col(mapping = aes(x=weekday,y=total_trips,fill = subscriber_type),position = position_dodge2()) + xlab("Weekday") + ylab("No of Trips") + ggtitle("Trips by Weekday") + scale_y_continuous(labels = scales::comma) + guides(fill = guide_legend(title = "Subscriber Type"))
plot_trips_by_weekday
The subscribers show a high usage pattern on weekdays. This must be due to the subsribers using the bikes for day-to-day commute, going for lunch,dinner, etc.
The customers show low usage over weekdays . However, over the weekends, the number of trips by the customers is almost equal to subscribers. This must be due to more customers using bikes over weekend for leisure purpose.
Now we will group the trips by hour and observe the hourly usage of subcribers and consumers.
trips_by_hour <- trips_stations_dataset %>% group_by(start_hour,subscriber_type) %>% summarise(total_trips = n())
plot_trips_by_hour <- ggplot(data = trips_by_hour) + geom_col(mapping = aes(x=start_hour,y=total_trips,fill = subscriber_type),position = position_dodge2()) + xlab("Hour") + ylab("No of Trips") + ggtitle("Trips by Hour") + scale_y_continuous(labels = scales::comma) + guides(fill = guide_legend(title = "Subscriber Type"))
plot_trips_by_hour
The subscribers hourly usage follows a bimodal distributions. It can be seen that the usage peaks at 8AM and 5PM. This must be due to the subsribers riding the bike to work in the morning and returning home in the evening.
The customers hourly usage follows a normal distribution. The customers show high usage from 12PM to 4 PM.
Let’s analyze how the ride usage changes by hour and weekday. We will group the trips by hour and weekday and overlay the number of trips as heatmap.
trips_hour_weekday <-trips_stations_dataset %>% group_by(weekday,start_hour,subscriber_type) %>% summarise(total_trips = n()) %>% ungroup() %>% mutate( subscriber_type = factor( subscriber_type))
plot_trips_hour_weekday <- trips_hour_weekday %>% ggplot(aes(x = start_hour,y = weekday,fill = total_trips)) + geom_tile(color="white",size = 0.1) + coord_equal() + facet_wrap( ~subscriber_type, ncol = 1 ) + scale_fill_viridis(name = "Total Trips", label = comma)+
labs(x = "Hour", y = "Weekday", title = "Trips by Hour and Weekday") + theme_tufte(base_family = "Helvetica") + theme( axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6) )
plot_trips_hour_weekday
In the above heatmap for subscribers, we can see that there is high usage on weekdays from 7-9 AM and 4-6PM. This must be due to subscribers using the bike for work commute.
We can also see moderate usage by subscribers at 12PM. This must be due to users going for lunch on bikes.
Customers tend to use bikes more on weekends between 11AM to 4PM. They must be using the bike for leisure purpose.
As stated above, the average duration of the trip is 17 minutes. Let’s observe the usage duration distribution for customers and subscribers.
trips_by_duration <- trips_stations_dataset %>% mutate(duration_minutes = ifelse((duration_sec/60.0)>60,60,duration_sec/60.0))
plot_trips_by_duration_subscriber <- trips_by_duration %>% filter(subscriber_type=="Subscriber") %>% ggplot() + geom_histogram(mapping = aes(duration_minutes),fill = "blue",bins = 50) + scale_y_continuous(labels = scales::comma) + labs(x = "Duration (minutes)", y = "No of Trips")
plot_trips_by_duration_customer <- trips_by_duration %>% filter(subscriber_type=="Customer") %>% ggplot() + geom_histogram(mapping = aes(duration_minutes),fill = "blue",bins = 50) + scale_y_continuous(labels = scales::comma) + labs(x = "Duration (minutes)", y = "No of Trips")
cowplot::plot_grid(plot_trips_by_duration_customer,plot_trips_by_duration_subscriber,labels = c("Customer","Subscriber"),label_size = 12,hjust = -1.5)
The usage duration distribution for riders for duration less than 60 minutes peaks around 8-10 minutes.
However, most of the trips taken by the customers have duration more than 60 minutes. This must be due to customers hiring the bike for shopping, or leisue ride around the city,
Most of trips taken by the subscriber last less than half an hour. This must be due to subscribers using the bike for specific purpose like going to office, market, etc.
Let’s look top 10 popular origin and destination stations.
trips_start_by_station <- trips_stations_dataset %>% group_by(start_station_name,start_station_latitude,start_station_longitude) %>% summarise(total_trips = n()) %>% ungroup %>% mutate(total_trips_normalized = scales::rescale(total_trips))
top_10_popular_origin_stations <- trips_start_by_station %>% dplyr::select(-c(total_trips_normalized,start_station_latitude,start_station_longitude)) %>% arrange(desc(total_trips)) %>% head(10)
Table(top_10_popular_origin_stations,scroll = T)
| start_station_name | total_trips |
|---|---|
| San Francisco Caltrain (Townsend at 4th) | 72683 |
| San Francisco Caltrain 2 (330 Townsend) | 56100 |
| Harry Bridges Plaza (Ferry Building) | 49062 |
| Embarcadero at Sansome | 41137 |
| 2nd at Townsend | 39936 |
| Temporary Transbay Terminal (Howard at Beale) | 39200 |
| Steuart at Market | 38531 |
| Market at Sansome | 35142 |
| Townsend at 7th | 34894 |
| Market at 10th | 30209 |
trips_end_by_station <- trips_stations_dataset %>% group_by(end_station_name, end_station_latitude,end_station_longitude) %>% summarise(total_trips = n()) %>% ungroup %>% mutate(total_trips_normalized = scales::rescale(total_trips))
top_10_popular_destination_stations <- trips_end_by_station %>% dplyr::select(-c(total_trips_normalized,end_station_latitude,end_station_longitude)) %>% arrange(desc(total_trips)) %>% head(10)
Table(top_10_popular_destination_stations,scroll = T)
| end_station_name | total_trips |
|---|---|
| San Francisco Caltrain (Townsend at 4th) | 92014 |
| San Francisco Caltrain 2 (330 Townsend) | 58713 |
| Harry Bridges Plaza (Ferry Building) | 50185 |
| Embarcadero at Sansome | 46197 |
| 2nd at Townsend | 44145 |
| Market at Sansome | 40956 |
| Steuart at Market | 39598 |
| Townsend at 7th | 38545 |
| Temporary Transbay Terminal (Howard at Beale) | 35477 |
| Market at 4th | 26762 |
Now we will analyze the density heatmap of origin and destination stations. This will help us visualize popular bike pick up and drop stations.
First, let us look at the heatmap of origin station.
plot_trips_start_by_station <- leaflet(trips_start_by_station) %>% addProviderTiles(providers$OpenStreetMap) %>% addScaleBar() %>% addHeatmap(lat = ~start_station_latitude, lng = ~start_station_longitude, intensity = ~total_trips_normalized, minOpacity = 0.1, max = 0.7)
plot_trips_start_by_station
The heatmap above shows that more number of trips start in San Fransisco followed by San Jose, Mountain View, Palo Alto, Redwood City.
Zooming in further into the heatmap shows San Francisco Caltrain (Townsend at 4th) and San Francisco Caltrain 2 (330 Townsend) as the most popular starting stations. Both the stations are outside the Caltrain station. Hence, a lot of users getting out of the station must be using it.
Now let us look at the heatmap for destination stations.
plot_trips_end_by_station <- leaflet(trips_end_by_station) %>% addProviderTiles(providers$OpenStreetMap) %>% addScaleBar() %>% addHeatmap(lat = ~end_station_latitude, lng = ~end_station_longitude, intensity = ~total_trips_normalized, minOpacity = 0.1, max = 0.7)
plot_trips_end_by_station
The popular destination stations are also located in San Fransico as seen in the heatmap above.
Here also, the popular destination stations are San Francisco Caltrain (Townsend at 4th) and San Francisco Caltrain 2 (330 Townsend). This must be due to riders parking the bikes here before boarding the train.
Now we analyze the inflow and outflow of bikes per station. This will help us to address the bike distribution problem.
The dots with red hue shows the stations which have more bike pickups than drops.The dots with yello hue shows almost equal pickup and drops whereas the green dots show low bike pickups than drops. Clicking on the station gives us info about the station name, number of trips started and no of trips ended at the station.
trips_start_end_total <-
inner_join(
trips_start_by_station,
trips_end_by_station,
by = c("start_station_name" = "end_station_name"),
suffix = c("_started_here", "_ended_here")
) %>% dplyr::select(-c(end_station_latitude, end_station_longitude)) %>% mutate(diff_start_end = total_trips_ended_here - total_trips_started_here) %>% mutate(content = paste(
paste("Station Name: ", start_station_name),
paste("Trips Started here: ", total_trips_started_here),
paste("Trips Ended here: ", total_trips_ended_here),
sep = "<br/>"
))
palData <- classInt::classIntervals(trips_start_end_total$diff_start_end,style = "quantile")
colPalette <- RColorBrewer::brewer.pal(n = 9, name = "RdYlGn")
trips_start_end_total <- trips_start_end_total %>% ungroup %>% mutate(colors = classInt::findColours(palData,colPalette))
plot_trips_start_end_total <- leaflet(trips_start_end_total) %>% addProviderTiles(providers$CartoDB.DarkMatter) %>% setView(lng = -122.419416,lat = 37.774929,zoom = 12) %>% addCircles(lng = ~start_station_longitude,lat = ~start_station_latitude,opacity = 1,fillOpacity = 1,color = ~colors,stroke = T,popup = ~content,radius = 25)
plot_trips_start_end_total
It can be seen that the stations such as Grant Avenue at Columbus Avenue and 2nd at Folsom have significantly more trips starting. Whereas the stations such as San Francisco Caltrain (Townsend at 4th) and Market at Sansome have more trips ending there.
This can help us develop the system to transport the bikes from low traffic stations to high traffic stations.
Now lets us look at the most popular routes taken by the users. Here we show the top 10 most popular routes.
popular_routes <- trips_stations_dataset %>% group_by(start_station_name,start_station_latitude,start_station_longitude,end_station_name, end_station_latitude,end_station_longitude) %>% summarise(total_trips = n())
top_10_popular_routes <- popular_routes %>% ungroup() %>% arrange(desc(total_trips)) %>% head(10) %>% mutate("Popular Routes" = paste(start_station_name,end_station_name,sep=" to ")) %>% dplyr::select("Popular Routes",total_trips)
Table(top_10_popular_routes)
| Popular Routes | total_trips |
|---|---|
| Harry Bridges Plaza (Ferry Building) to Embarcadero at Sansome | 9150 |
| San Francisco Caltrain 2 (330 Townsend) to Townsend at 7th | 8508 |
| 2nd at Townsend to Harry Bridges Plaza (Ferry Building) | 7620 |
| Harry Bridges Plaza (Ferry Building) to 2nd at Townsend | 6888 |
| Embarcadero at Sansome to Steuart at Market | 6874 |
| Townsend at 7th to San Francisco Caltrain 2 (330 Townsend) | 6836 |
| Embarcadero at Folsom to San Francisco Caltrain (Townsend at 4th) | 6351 |
| San Francisco Caltrain (Townsend at 4th) to Harry Bridges Plaza (Ferry Building) | 6215 |
| Steuart at Market to 2nd at Townsend | 6039 |
| Steuart at Market to San Francisco Caltrain (Townsend at 4th) | 5959 |
Now let us visualize the top 500 popular routes. The thickness of the lines represents the number of trips taken on that routes. Clicking on the routes gives us information about the route and trips taken on that route.
top_500_popular_routes <- popular_routes %>% ungroup() %>% arrange(desc(total_trips)) %>% head(500) %>% mutate(content = paste(paste("Route: ",paste(start_station_name,end_station_name,sep="-")),paste("Total Trips: ",total_trips),sep = "<br/>"))
palData <- classInt::classIntervals(top_500_popular_routes$total_trips,style = "quantile")
colPalette <- RColorBrewer::brewer.pal(n = 9, name = "YlOrRd")
top_500_popular_routes <- top_500_popular_routes %>% ungroup %>% mutate(colors = classInt::findColours(palData,colPalette)) %>% mutate(total_trips_normalized = scales::rescale(total_trips))
plot_popular_routes <- leaflet(top_500_popular_routes) %>% addProviderTiles(providers$CartoDB.DarkMatter) %>% setView(lng = -122.419416,lat = 37.774929,zoom = 12)
for (i in nrow(top_500_popular_routes):1)
plot_popular_routes<-plot_popular_routes%>%addPolylines(lat=c(top_500_popular_routes[i,]$start_station_latitude,top_500_popular_routes[i,]$end_station_latitude),lng=c(top_500_popular_routes[i,]$start_station_longitude,top_500_popular_routes[i,]$end_station_longitude),color = "yellow",popup = top_500_popular_routes[i,]$content,weight = top_500_popular_routes[i,]$total_trips_normalized*5)
plot_popular_routes
Efficient bike transportation system should be put in place to transport bikes from low traffic stations to high traffic stations.
A model can be built to forecast bike demands at each station. This will help us optimize the bike deployment system and hence increase the revenue.
Discounts should be offered during weekends and non-peak hours. This will help increase the bike utilization and increase the revenue.
Marketing promotions should be targetted on regular customers to convert them into subscribers.
Surge pricing should be implemented based on the outflow and inflow at each station.